A What-if Engine for Cost-based MapReduce Optimization

Authors

  • Herodotos Herodotou
  • Shivnath Babu
Abstract

The Starfish project at Duke University aims to provide MapReduce users and applications with good performance automatically, without any need on their part to understand and manipulate the numerous tuning knobs in a MapReduce system. This paper describes the What-if Engine, an indispensable component in Starfish, which serves a purpose similar to that of the costing engine used by the query optimizer in a database system. We discuss the problem and challenges addressed by the What-if Engine. We also discuss the techniques used by the What-if Engine and the design decisions that led us to these techniques.

Similar resources

Profiling, What-if Analysis, and Cost-based Optimization of MapReduce Programs

MapReduce has emerged as a viable competitor to database systems in big data analytics. MapReduce programs are being written for a wide variety of application domains including business data processing, text analysis, natural language processing, Web graph and social network analysis, and computational science. However, MapReduce systems lack a feature that has been key to the historical succes...

MapReduce Programming and Cost-based Optimization? Crossing this Chasm with Starfish

MapReduce has emerged as a viable competitor to database systems in big data analytics. MapReduce programs are being written for a wide variety of application domains including business data processing, text analysis, natural language processing, Web graph and social network analysis, and computational science. However, MapReduce systems lack a feature that has been key to the historical succes...

Cost Based Multi-Way Equi-Join Optimization in MapReduce

MapReduce is a prominent programming model, built on a shared-nothing architecture, for processing big data with a parallel, distributed algorithm on a cluster. Join is an important operation, but it is very inefficient in MapReduce. In this work, a time cost based evolution model is proposed for multi-way join by considering the time cost calculation. A multi-way join consists of start pattern joins and chai...

Traffic Analysis in MapReduce

MapReduce is a programming model that processes large data sets and produces output. It relies on two functions to complete the work: the Map function and the Reduce function. The Map function is assigned fragmented data as input, emits keyed intermediate data, and sends this keyed intermediate data to the Reducer, where Reducer will get the inpu...

Stubby: A Transformation-based Optimizer for MapReduce Workflows

There is a growing trend of performing analysis on large datasets using workflows composed of MapReduce jobs connected through producer-consumer relationships based on data. This trend has spurred the development of a number of interfaces—ranging from program-based to query-based interfaces—for generating MapReduce workflows. Studies have shown that the gap in performance can be quite large bet...

Journal:
  • IEEE Data Eng. Bull.

Volume 36

Publication date: 2013